Abstract. We introduce the Statistical Asynchronous Regression (SAR) method: 
a technique for determining a relationship between two time varying quantities without 
simultaneous measurements of both quantities. We require that there is a time invariant, 
monotonic function Y = u(X) relating the two quantities, Y and X. In order to determine 
u(X), we only need to know the statistical distributions of X and Y. We show that u(X) is 
the change of variables that converts the distribution of X into the distribution of Y, while 
conserving probability. We describe an algorithm for implementing this method and apply it 
to several example distributions. We also demonstrate how the method can separate spatial 
and temporal variations from a time series of energetic electron flux measurements made by 
a spacecraft in geosynchronous orbit. We expect this method will be useful to the general 
problem of spacecraft instrument calibration. We also suggest some applications of the SAR 
method outside of space physics. 
1. Introduction 

We developed the Statistical Asynchronous Regression (SAR) technique described in this 
paper as part of a study of relativistic electron conditions at geosynchronous orbit. This part of 
the Earth's radiation belts can evolve on a timescale of hours or even minutes. Unfortunately, 
while individual satellites may make measurements every few seconds, it is difficult to separate 
the temporal changes from consequences of orbital motion. The easiest way to do this would 
be to have continuous measurements at a fixed location, or local time, such as local noon. 
Instead, we have continuous measurements on board moving spacecraft. We can remove the 
orbital effects if we can map our continuous measurements to local noon at geosynchronous 
orbit. 

Relativistic electrons in the vicinity of geosynchronous orbit drift around the earth every 
5-15 minutes under the influence of the local magnetic field. As it happens, these electrons do 
not follow circular paths like satellite orbits, but rather elliptical paths that depend on the 
details of the local magnetic field geometry. However, because electron density is a relatively 
smooth function of altitude near geosynchronous orbit, measurements at different local times 
are strongly correlated. This correlation is stronger still if we average our data over several 
drift periods. The strong correlation suggests that we can map our continuous measurements 
to local noon, if we can determine the right mapping function. 

Sometimes it is possible to determine empirical mappings between measurements at 
different local times by regression of simultaneous measurements. For example, it is possible 
to relate measurements made by the GOES 8 spacecraft at local dawn (0600) to GOES 9 
measurements at local 10 AM (1000), because whenever GOES 8 is at local dawn, GOES 9 is 
at local 10 AM. However, it is never the case that GOES 8 is at local dawn when GOES 9 is 
at local noon. Therefore, we need some method for mapping measurements from anywhere to 
local noon (or some other local time of interest). Until recently, there have been three strategies 
for resolving this difficulty: interpolate between multiple calibrated spacecraft [Reeves et al., 
1998], use the equation of motion of electrons in model electromagnetic fields to follow particles 
around geosynchronous orbit [Friedel et al., 1999], or use some kind of empirical description 
of the orbital variations [Brautigam et al., 1992; Vette, 1991]. The first approach degrades 
substantially when only a few spacecraft are available, and fails when only one spacecraft is 
available. The second approach suffers from the substantial imperfections in our magnetic field 
models near geosynchronous orbit [Selesnick and Blake, 2000]. The third approach has been applied 
with encouraging success by Moorer [1999], who uses whatever measurements are available 
to adjust the CRRESELE empirical radiation belt model for best agreement. The SAR 
technique provides us with a more robust approach that can be applied in cases when there 
is no pre-existing empirical model like CRRESELE. The SAR technique calibrates not only 
between spacecraft and instruments but also between different locations (local times) around 
geosynchronous orbit. One can easily imagine the SAR technique as calibrating measurements 
made by GOES 8 at local dawn to measurements made by GOES 9 at local noon, even though 
these two spacecraft have never been at these locations simultaneously. Additionally, the SAR 
technique is non-parametric because it does not require us to assume a functional form for the 
mapping between local times. 

When we have described the SAR technique to our colleagues, many have found it novel 
and challenging to understand, and some have stated that it might be useful in their own 
work on other problems. For our own purposes, since we have used this technique as the basis 
of a statistical study of the energetic electrons near geosynchronous orbit, we present this 
technique to familiarize our audience with the technique and to demonstrate its robustness. 
As we believe the SAR technique has applications beyond the electron radiation belts, we have 
chosen to dedicate this paper entirely to the technique itself, reserving the radiation belt study 
to a later publication. 

In essence, our method provides a means of performing a regression of one time varying 
quantity against another without requiring simultaneous knowledge of both. We call this 
the Statistical Asynchronous Regression (SAR) method, because it allows us to regress Y(t) 
against X(t) using only the two statistical distributions F(x) and G(y). The SAR method 
determines the function Y = u(X) by matching the quantiles (or percentiles) x and y of the 
distributions of X and Y for each probability level. A primitive variant of this technique 
was developed to standardize the calculation of K indices at different magnetic observatories 
[Mayaud, 1980, and references therein]. We also note that a transformation similar to the SAR 
method has been introduced to map non-Gaussian random variables onto Gaussian ones, with 
application to the construction of multivariate distribution functions in high-energy particle 
physics experiments [Karlen, 1998], in the theory of portfolios in finance [Sornette et al., 
2000], and earlier in the treatment of bivariate gamma distributions [Moran, 1969]. 

In statistics, one method of graphical hypothesis testing is the Q-Q (quantile-quantile) 
plot [Wilk and Gnanadesikan, 1968], which is essentially a graphical depiction of u(X) based 
on the same principle as the SAR method. A linear u(X) indicates that the two variables differ 
only by a scaling and an offset but are otherwise identically distributed. However, in spite of 
the variety of graphical techniques related to the SAR method, none makes use of the plotted 
u(X), aside from determining whether it is linear [Fisher, 1983]. Since we are specifically 
interested in potentially nonlinear u(X), we have developed the SAR method as an extension 
to the Q-Q plot. 

Under various names, such as anchoring or the equipercentile method, psychological and 
educational testing use the same principle as the SAR technique to normalize a new test to a 
standard score distribution [Allen and Yen, 1979]. However, u(X) is not explicitly calculated, 
and the information it contains is typically discarded. 

Additionally, the Spearman rank order correlation coefficient touches on the same notion 
as the SAR method [Press et al., 1992]. It calculates a linear correlation coefficient between 
the sorted rank orders of two quantities rather than the quantities themselves; this coefficient 
measures the quality of the optimal nonlinear mapping between two simultaneously measured 
quantities. Since we are concerned with comparing quantities not measured simultaneously, 
we will not make use of the Spearman coefficient. 

In the remainder of this paper, we will provide a description and some limited analysis 
of the SAR method. First, we will describe the technique by parable, using a graphical 
illustration. Next we will provide the formal derivation of the technique. We will provide 
several examples and a simple recipe for the implementation of the SAR technique. Then we 
will address the problems of finite sample size and noisy measurements. Finally we will show 
how we use the SAR method to map geosynchronous energetic electron flux from one local 
time to another. 

2. A Simple Example 

We begin our explanation of the SAR technique by taking a step back from space physics 
to a simpler analogous problem. Suppose we have two meteorologists making measurements 
every other day. One has been measuring his favorite meteorological quantity X, and the other 
has been measuring Y. Unfortunately, owing to an error in scheduling, the two meteorologists 
have not been making their measurements on the same days. It is therefore impossible for 
them to plot Y against X and perform a regression. We will show how it is nonetheless 
possible for them to recover the empirical function Y = u(X). The powerful statistical tool 
that will make this possible is the fundamental principle that probability is conserved under 
a change of variables. We will leave the mathematical presentation of this principle to later 
sections. 

In Figure 1, we have plotted the probability density functions (PDFs) f(x) and g(y) 
along the x- and y-axes respectively. For clarity, we have plotted f(x) upside down and g(y) 
rotated counterclockwise. Each density function represents the distribution of observations 
made by one of the scientists. In this example, X is distributed uniformly between 1 and 2, 
and Y is distributed as 1/y between $e$ and $e^2$. We have also plotted the relational function 
$Y = u(X) = e^X$ that provides the change of variables. The shaded area within f(x) is the 
probability that a single measurement of X falls between $x_1$ and $x_2$. Similarly, the shaded 
area within g(y) is the probability that a single measurement of Y falls between $y_1 = u(x_1)$ 
and $y_2 = u(x_2)$. The conservation of probability is illustrated graphically by the fact that the 
two shaded regions are equal in area. With any two of these three curves, it is possible to 
determine the third. Generally, it has been of greater interest to reconstruct g(y) knowing 
f(x) and u(X). We, however, are interested in reconstructing u(X) knowing only f(x) and 
g(y). The fundamental assumption is that of stationarity: the unknown relationship Y = u(X) 
is the same at all times; this condition must be met for a statistical approach to be possible. 
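
The conservation of probability described above can be checked numerically. The following is a minimal sketch, assuming the example distributions of this section (f uniform on [1, 2], g(y) = 1/y on [e, e^2]) and taking u(x) = e^x, the mapping derived for this example in Section 4.1. The function names and the midpoint-rule integrator are illustrative choices and are not part of the method itself.

```python
import math

def f(x):
    """PDF of X: uniform on [1, 2]."""
    return 1.0 if 1.0 <= x <= 2.0 else 0.0

def g(y):
    """PDF of Y: 1/y on [e, e^2]."""
    return 1.0 / y if math.e <= y <= math.e ** 2 else 0.0

def u(x):
    """Change of variables assumed for this example."""
    return math.exp(x)

def integrate(pdf, a, b, n=20000):
    """Midpoint-rule integral of pdf over [a, b]."""
    h = (b - a) / n
    return h * sum(pdf(a + (k + 0.5) * h) for k in range(n))

# Arbitrary interval [x1, x2] inside the support of X.
x1, x2 = 1.2, 1.7
area_x = integrate(f, x1, x2)           # probability that X lies in [x1, x2]
area_y = integrate(g, u(x1), u(x2))     # probability that Y lies in [u(x1), u(x2)]
print(area_x, area_y)
```

The two printed areas should agree to within the integration error, which is the equality of the shaded regions in Figure 1.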

One can reconstruct Y = u(X) for each X simply by finding the value Y such that the 
area inside g(y) from $-\infty$ to Y is equal to the area inside f(x) from $-\infty$ to X. In Figure 2, 
we demonstrate this cumulative way of looking at the problem. Instead of plotting the density 
functions f(x) and g(y), we have plotted the cumulative distribution functions (CDFs) F(x) 
and G(y). The CDFs are the integrals from $-\infty$ to x of f(x) and $-\infty$ to y of g(y), and 
they correspond to the areas inside f(x) and g(y) mentioned above. To find the Y that 
corresponds to a given X in Figure 2, one reads from the X value on the abscissa up to F(x), 
then horizontally over to the same value of G(y), and back down to the abscissa to find the 
corresponding Y. Compared to Figure 1, this visualization makes it easier to find Y for a 
given X, but does not provide an obvious representation of u(X). While emphasizing different 
features of the method, these two graphical representations of the method give identical 
results. In the following sections, we will provide the formal mathematical treatment of the 
graphical operations. 



3. Formalism 

Some of our readers will no doubt be a bit rusty in the manipulation of probabilities. 
Therefore, we have included a thorough treatment of the change of variables theorem in an 
appendix. Here, we begin with the differential form of the change of variables: 

$$ f(x)\,dx = g(u(x))\,|u'(x)|\,dx = g(y)\left|\frac{dy}{dx}\right|dx = g(y)\,|dy|. \tag{1} $$

In order to use this equation, we must determine the sign of u'(x). For distributions with 
only one tail, we can do this rather easily by examining the rare values of X and Y. When 
the rare values of X and Y fall at the same end of the real number line, u'(x) is positive. 
When they fall at opposite ends, u'(x) is negative. Physical insight is also a useful tool in 
determining the sign of u'(x). If we expect larger (or more positive) values of X to correspond 
to larger values of Y, then u'(x) is positive. If we expect larger values of X to correspond to 
smaller (or more negative) values of Y, then u'(x) is negative. 

For u'(x) > 0, we can integrate (1), 

$$ \int_{-\infty}^{x} f(x')\,dx' = \int_{-\infty}^{y} g(y')\,dy'. \tag{2} $$

This equation implicitly defines y = u(x) as the function that provides the matching integration 
bounds. We recognize these integrals as the CDFs of X and Y, so we can rewrite (2) as 

$$ F(x) = G(y) \quad \text{for } u'(x) > 0. \tag{3} $$

We can invert G(y) to arrive at an explicit equation for u(x), 

$$ y = G^{-1}(F(x)) = u(x). \tag{4} $$

This equation represents the mathematical counterpart to the graphical operation described 
in Figure 2, where one moves up from X to F(x), then across to G(y), then back down to the 
corresponding Y. 

For u'(x) < 0, we can integrate (1), 

$$ \int_{-\infty}^{x} f(x')\,dx' = \int_{y}^{+\infty} g(y')\,dy'. \tag{5} $$

Converting this equation to CDFs, we have 

$$ F(x) = 1 - G(y) \quad \text{for } u'(x) < 0. \tag{6} $$

Solving for u(x), we arrive at 

$$ y = G^{-1}(1 - F(x)) = u(x). \tag{7} $$

Combining (4) and (7), we arrive at 

$$ u(x) = \begin{cases} G^{-1}(F(x)) & \text{for } u'(x) > 0, \\ G^{-1}(1 - F(x)) & \text{for } u'(x) < 0. \end{cases} \tag{8} $$

It is clear, then, that all we need to determine u(x) is knowledge of the sign of u'(x) and 
either F(x) and G(y) or f(x) and g(y). We summarize the desirable properties of u(x) as 
follows: 
• it can be arbitrarily nonlinear; 

• its determination is not parametric; 

• it maps the entire distribution and all of the moments of X onto those of Y; 

• it can be determined without simultaneous measurements of X and Y. 
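
When F and G are available as functions, equation (8) can be evaluated directly. The sketch below is one possible way to do so, assuming G is continuous and non-decreasing on a known interval so that it can be inverted by bisection. The names `invert_cdf` and `sar_map` are hypothetical, and the CDFs used for the demonstration are those of the meteorological example (X uniform on [1, 2], Y distributed as 1/y on [e, e^2]).

```python
import math

def invert_cdf(G, p, lo, hi, tol=1e-12):
    """Solve G(y) = p for y on [lo, hi] by bisection (G must be non-decreasing)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if G(mid) < p:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def sar_map(x, F, G, y_lo, y_hi, increasing=True):
    """Evaluate u(x) = G^{-1}(F(x)) or G^{-1}(1 - F(x)), following equation (8)."""
    p = F(x) if increasing else 1.0 - F(x)
    return invert_cdf(G, p, y_lo, y_hi)

# CDFs of the meteorological example, clipped to [0, 1] outside the support.
F = lambda x: min(max(x - 1.0, 0.0), 1.0)
G = lambda y: min(max(math.log(y) - 1.0, 0.0), 1.0)

y = sar_map(1.5, F, G, math.e, math.e ** 2)
print(y, math.exp(1.5))   # the two values should agree closely
```

Passing `increasing=False` selects the second branch of (8); the sign of u'(x) still has to be supplied from the considerations discussed in this section.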

4. More Examples 

We now turn to some more sophisticated examples of the SAR method. First, we will 
return to our original meteorological example to demonstrate the SAR procedure on analytical 
functions. Then, we will provide a function relating a bimodal distribution to a Gaussian. 
Finally, we will demonstrate the method on a stretched exponential and a Gaussian. 



4.1. Meteorological Example 

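In the example of the meteorologists, illustrated in Figures 1 and 2, the following 
analytical functions were used: 

$$ f(x) = \begin{cases} 1 & \text{for } 1 \le x \le 2, \\ 0 & \text{otherwise}, \end{cases} \tag{9} $$

$$ g(y) = \begin{cases} 1/y & \text{for } e \le y \le e^2, \\ 0 & \text{otherwise}. \end{cases} \tag{10} $$

Using (A3) and (A4) together with (9) and (10), we have 

$$ F(x) = \begin{cases} 0 & \text{for } x < 1, \\ x - 1 & \text{for } 1 \le x \le 2, \\ 1 & \text{for } x > 2, \end{cases} \tag{11} $$

$$ G(y) = \begin{cases} 0 & \text{for } y < e, \\ \log y - 1 & \text{for } e \le y \le e^2, \\ 1 & \text{for } y > e^2. \end{cases} \tag{12} $$

Inserting (11) and (12) into (8), we see that 

$$ u(x) = G^{-1}(F(x)) = e^{1 + F(x)} = e^{x}. \tag{13} $$

Adding in the proper bounds, we have 

$$ u(x) = e^{x} \quad \text{for } 1 \le x \le 2. \tag{14} $$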

4.2. Bimodal Example 

In our next example, we will show how the SAR method easily handles bimodal 
distributions. We have chosen X to be bimodal and Y to be unimodal. The PDFs are 

$$ f(x) = \frac{1}{2\sqrt{2\pi}} \left[ e^{-\frac{1}{2}(x-3)^2} + e^{-\frac{1}{2}(x-[\ldots])^2} \right], \tag{15} $$

$$ g(y) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(y-10)^2}. \tag{16} $$

While there is no closed form for u(X), a graphical display can show its qualitative features. 
Figure 3 shows how the bimodal f(x) maps to g(y). The highly nonlinear mapping u(X) has 
a flat spot (with small but still positive slope) corresponding to the local minimum in f(x), 
since u'(x) = f(x) / g(u(x)). In Figure 3, we see how a large range of X values near X = b 
maps to a very narrow range of Y values near Y = 10. More generally, the terraced shape of 
u(X) can be seen to generate bimodal or multimodal distributions from unimodal ones. 

4.3. Stretched Exponential Example 

For our final example, we will treat an unusual distribution and an unusual mapping. We 
consider the case of a stretched exponential mapped to a Gaussian. In this case, X and Y are 
distributed as 



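$$ f(x) = \frac{c}{\sqrt{\pi}\,x_0} \left(\frac{x}{x_0}\right)^{\frac{c}{2}-1} e^{-\left(\frac{x}{x_0}\right)^{c}} \quad \text{for } x > 0, \tag{17} $$

$$ g(y) = \frac{2}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}} \quad \text{for } y > \mu, \tag{18} $$

where $c$, $\sigma$, and $x_0$ are positive real values. Using (1) and assuming u'(x) > 0, we can write a 
differential equation for u(x), 

$$ \frac{c}{\sqrt{\pi}\,x_0} \left(\frac{x}{x_0}\right)^{\frac{c}{2}-1} e^{-\left(\frac{x}{x_0}\right)^{c}} = \frac{2}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(u(x)-\mu)^2}{2\sigma^2}}\, u'(x). \tag{19} $$

By our design of (17), u(x) will cause the two exponentials to drop out of the equation, 
satisfying the system 

$$ \frac{(u(x)-\mu)^2}{2\sigma^2} = \left(\frac{x}{x_0}\right)^{c}, \tag{20} $$

$$ \frac{c}{\sqrt{\pi}\,x_0} \left(\frac{x}{x_0}\right)^{\frac{c}{2}-1} = \frac{2}{\sqrt{2\pi}\,\sigma}\, u'(x). \tag{21} $$

Solving (20) for u(x), we have 

$$ u(x) = \mu + \sqrt{2}\,\sigma \left(\frac{x}{x_0}\right)^{\frac{c}{2}}, \tag{22} $$
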
which is, in fact, the solution to (21) and thus of (19). This mapping function is a highly 
nonlinear power-law. In Figure 4, we have depicted the borderline case for $c = 1$, $x_0 = 1$, 
$\sigma = 1/\sqrt{2}$ and $\mu = 0$. For $c < 1$, this distribution becomes a stretched exponential, which is 
a common distribution in real data. While f(x) diverges at x = 0, the SAR method cleanly 
recovers the mapping function $u(x) = \sqrt{x}$, as (22) gives for these parameter values. We now 
investigate the robustness of the SAR method on finite and noisy data sets. 
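
As a numerical spot check of this example, the sketch below assumes the forms f(x) = c/(sqrt(pi) x0) (x/x0)^(c/2 - 1) exp(-(x/x0)^c), g(y) = 2/(sqrt(2 pi) sigma) exp(-(y - mu)^2 / (2 sigma^2)) and u(x) = mu + sqrt(2) sigma (x/x0)^(c/2), which is how we read equations (17), (18) and (22). It verifies that the probability of X falling in an interval equals the probability of Y falling in the mapped interval. The parameter values and the integrator are arbitrary illustrative choices.

```python
import math

def f(x, c, x0):
    """Density of X as read from equation (17), valid for x > 0."""
    return (c / (math.sqrt(math.pi) * x0)
            * (x / x0) ** (c / 2 - 1)
            * math.exp(-(x / x0) ** c))

def g(y, mu, sigma):
    """Half-Gaussian density of Y as read from equation (18), valid for y > mu."""
    return (2 / (math.sqrt(2 * math.pi) * sigma)
            * math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)))

def u(x, c, x0, mu, sigma):
    """Mapping as read from equation (22)."""
    return mu + math.sqrt(2) * sigma * (x / x0) ** (c / 2)

def integrate(func, a, b, n=20000):
    """Midpoint-rule integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

# Arbitrary illustrative parameters and interval (kept away from x = 0).
c, x0, mu, sigma = 0.7, 2.0, 0.3, 1.5
x1, x2 = 0.5, 3.0
p_x = integrate(lambda x: f(x, c, x0), x1, x2)
p_y = integrate(lambda y: g(y, mu, sigma),
                u(x1, c, x0, mu, sigma), u(x2, c, x0, mu, sigma))
print(p_x, p_y)   # the two probabilities should agree closely
```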






5. The Algorithm and Associated Approximation Problems 

So far, we have considered the analytical representations of f(x) and g(y). However, in 
practice, we will only have a finite number of samples of each variable. We can use these 
samples to construct F(x) and G(y) and then perform either a tabular or an analytical 
approximation to (8). 

First, we sort the X and Y values. These sorted values give us an approximation to F(x) 
and G(y). For example, if $x_i$ is the $i$th smallest value in $N_x$ measurements of X, then an 
estimate of F(x) is 

$$ F^*(x_i) = \frac{i}{N_x}. \tag{23} $$

Similarly, we estimate G(y) as 

$$ G^*(y_j) = \frac{j}{N_y}. \tag{24} $$

There are more sophisticated methods of estimating these distributions, such as kernel 
estimators [Hardle, 1990], if the need arises. 

Henceforth, we will only treat the case u'(X) > 0, but the interested reader can easily 
derive the u'(X) < 0 case in a similar fashion. To obtain u(X) for a particular X, we find $i$ 
such that 

$$ x_i \le X < x_{i+1}. \tag{25} $$

Next we find $j_1$ and $j_2$ such that 

$$ G^*(y_{j_1}) \le F^*(x_i) < G^*(y_{j_1+1}), \tag{26} $$

$$ G^*(y_{j_2}) \ge F^*(x_{i+1}) > G^*(y_{j_2-1}). \tag{27} $$

We then have an estimate of Y, 

$$ Y \approx \frac{y_{j_1} + y_{j_2}}{2}. \tag{28} $$

By determining a Y for each sample of X, we achieve a tabular definition of y = u(X). We 
have depicted the mapping process and the uncertainty for the bimodal example in Figure 5. 
We have chosen artificially small datasets of $N_x = 15$ and $N_y = 25$ to illustrate the estimation 
effect. 
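
The tabular procedure can be sketched in a few lines. The code below is a minimal illustration for increasing u, assuming the empirical CDFs F*(x_i) = i/N_x and G*(y_j) = j/N_y and the rank conditions j_1/N_y <= i/N_x and j_2/N_y >= (i+1)/N_x. The function name `sar_estimate`, the clamping of ranks at the ends of the sample, and the synthetic test data are our own illustrative assumptions.

```python
import bisect
import math
import random

def sar_estimate(x, xs_sorted, ys_sorted):
    """Estimate Y = u(X) and its half-width for increasing u from sorted samples."""
    nx, ny = len(xs_sorted), len(ys_sorted)
    # i = number of samples <= x, so that x_i <= x < x_{i+1} (1-based ranks).
    i = bisect.bisect_right(xs_sorted, x)
    # Largest rank j1 with j1/Ny <= i/Nx.
    j1 = (i * ny) // nx
    # Smallest rank j2 with j2/Ny >= (i+1)/Nx (integer ceiling division).
    j2 = -((-(i + 1) * ny) // nx)
    # Clamp to valid 1-based ranks at the edges of the sample.
    j1 = min(max(j1, 1), ny)
    j2 = min(max(j2, 1), ny)
    y1, y2 = ys_sorted[j1 - 1], ys_sorted[j2 - 1]
    # Midpoint estimate and half-width of the bracketing interval.
    return 0.5 * (y1 + y2), 0.5 * (y2 - y1)

# Synthetic, non-simultaneous samples from the meteorological example:
# X uniform on [1, 2]; Y = exp(U) with U uniform on [1, 2] has density 1/y.
random.seed(0)
xs = sorted(random.uniform(1.0, 2.0) for _ in range(2000))
ys = sorted(math.exp(random.uniform(1.0, 2.0)) for _ in range(3000))

y_hat, half_width = sar_estimate(1.5, xs, ys)
print(y_hat, half_width, math.exp(1.5))
```

Calling `sar_estimate` for each sample of X yields the tabular definition of u(X); the X and Y samples are drawn independently here, so no simultaneous pairs are used.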



The approximate uncertainty $\Delta y$ in the Y estimated from (28) is given by 

$$ \Delta y \approx \frac{y_{j_2} - y_{j_1}}{2}. \tag{29} $$

We can rewrite (29) in terms of $G^{*-1}$ as 

$$ \Delta y \approx \frac{j_2 - j_1}{2N_y}\; \frac{G^{*-1}(j_2/N_y) - G^{*-1}(j_1/N_y)}{(j_2 - j_1)/N_y}. \tag{30} $$

This expression contains a first order estimate of the derivative of $G^{*-1}$ which, using (A4), can 
be expressed in terms of g(y) as 

$$ \frac{dG^{-1}}{dG} = \left(\frac{dG}{dy}\right)^{-1} = \frac{1}{g(y)}. \tag{31} $$

Therefore (30) can be expressed as 

$$ \Delta y \approx \frac{j_2 - j_1}{2 N_y\, g(y)}. \tag{32} $$

Rewriting (26) and (27) using (23) and (24), we have 

$$ \frac{j_1}{N_y} \le \frac{i}{N_x}, \tag{33} $$

$$ \frac{j_2}{N_y} \ge \frac{i+1}{N_x}. \tag{34} $$

Therefore 

$$ \frac{j_2 - j_1}{N_y} \approx \frac{1}{N_x}, \tag{35} $$

which leads us to 

$$ \Delta y \approx \frac{1}{2 g(y) N_x}. \tag{36} $$

Here, $N_x$ accounts for the sampling effect. This relationship implies that in the rarefied regions 
of the Y distribution, where g(y) is small, the estimation error is large. It also suggests that, 
to first order, increasing the Y sample size $N_y$ is not as useful in reducing $\Delta y$ as would be 
increasing the X sample size $N_x$. However, the uncertainty in x is also important because the 
total uncertainty in x-y space is $\Delta x \Delta y$. By a derivation similar to that of $\Delta y$, we have 

$$ \Delta x \approx \frac{1}{2 f(x) N_y}, \tag{37} $$

for a total uncertainty of 

$$ (2\Delta x)(2\Delta y) \approx \frac{1}{f(x)\, g(y)\, N_x N_y}. \tag{38} $$

To improve the overall quality of the reconstruction of y = u(X), we would like both $N_x$ and 
$N_y$ to be as large as possible. 
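
To give a feel for the magnitudes involved, the sketch below evaluates the first-order estimates $\Delta y \approx 1/(2 g(y) N_x)$ and $\Delta x \approx 1/(2 f(x) N_y)$ for the meteorological example at x = 1.5. The sample sizes are the small illustrative values quoted above, and pairing them with the meteorological densities is our own choice for this illustration.

```python
import math

def delta_y(g_of_y, n_x):
    """First-order half-width of the Y estimate, 1 / (2 g(y) N_x)."""
    return 1.0 / (2.0 * g_of_y * n_x)

def delta_x(f_of_x, n_y):
    """First-order half-width in X, 1 / (2 f(x) N_y)."""
    return 1.0 / (2.0 * f_of_x * n_y)

# Meteorological example at x = 1.5: f(x) = 1 and g(y) = 1/y with y = e^x.
n_x, n_y = 15, 25
x = 1.5
y = math.exp(x)
dy = delta_y(1.0 / y, n_x)
dx = delta_x(1.0, n_y)
area = (2 * dx) * (2 * dy)   # total uncertainty, 1 / (f g N_x N_y)
print(dy, dx, area)
```

Because g(y) = 1/y decreases with y, the estimated $\Delta y$ grows toward the upper end of the Y distribution, in line with the remark about rarefied regions.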
6. The SAR on Noisy Data 

Another consideration for the implementation of the SAR method is the effect of noise. 
Until now, we have assumed that there is no noise in our measurements of X and Y. However, 
in practice, we always encounter noisy data, and we want to be sure that the SAR does not 
become invalid under typically noisy conditions. In a standard regression, where simultaneous 
values of X and Y are known, a least-squares approach can be used to determine u(X) from 
noisy X and Y. We will attempt to demonstrate the effect of noise on the SAR method by 
simulating a noisy version of the meteorological example. We generate 100 noisy samples from 
the distributions f(x) and g(y) given in (9) and (10). The noise distributions are chosen to be 
unbiased Gaussians with standard deviations $\eta_x$ and $\eta_y$ for X and Y. For now, we choose $\eta_x$ 
and $\eta_y$ to be 25% of the standard deviations $\sigma_x$ and $\sigma_y$ of X and Y. We can fit the noisy data 
with $\log Y = \alpha X + \log\beta$. We perform two such fits: a standard least-squares regression on the 
(X, log Y) pairs and a least-squares regression on the (X, log u(X)) pairs produced by the SAR 
method described in Section 5. For this parametric example, a maximum likelihood estimation of 
$\alpha$ and $\beta$ would probably outperform the least-squares approach, but we will compare to the 
more familiar regression for this illustration. 
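
A simulation of this kind can be sketched as follows. This is not the code used for the results reported here; it is a simplified illustration that assumes equal sample sizes for X and Y, so that the SAR mapping reduces to pairing the sorted X values with the sorted Y values, and it averages the fitted slope over repeated trials. The helper names, the random seed, and the number of trials are arbitrary.

```python
import math
import random

def least_squares(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def one_trial(rng, n=100, r=0.25):
    """Return (alpha_standard, alpha_sar) for one noisy realization."""
    sigma_x = math.sqrt(1.0 / 12.0)   # std of a uniform variable on [1, 2]
    # std of Y with density 1/y on [e, e^2]: E[Y^2] - E[Y]^2
    sigma_y = math.sqrt((math.e ** 4 - math.e ** 2) / 2 - (math.e ** 2 - math.e) ** 2)
    x_true = [rng.uniform(1.0, 2.0) for _ in range(n)]
    x_obs = [x + rng.gauss(0.0, r * sigma_x) for x in x_true]
    y_obs = [math.exp(x) + rng.gauss(0.0, r * sigma_y) for x in x_true]
    # Standard regression uses the simultaneous (X, log Y) pairs.
    alpha_std, _ = least_squares(x_obs, [math.log(y) for y in y_obs])
    # SAR discards the pairing: equal-size samples are matched by rank.
    alpha_sar, _ = least_squares(sorted(x_obs), [math.log(y) for y in sorted(y_obs)])
    return alpha_std, alpha_sar

rng = random.Random(0)
trials = [one_trial(rng) for _ in range(200)]
mean_std = sum(t[0] for t in trials) / len(trials)
mean_sar = sum(t[1] for t in trials) / len(trials)
print(mean_std, mean_sar)   # ideal slope is alpha = 1
```

In this sketch the slope from the rank-matched fit tends to lie closer to 1 than the slope from the paired regression, which is attenuated by the noise in X. The exact values depend on the seed and on the simplifications listed above.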

Ideally, $\alpha = 1$ and $\beta = 1$, but, for the noisy data, the two regressions give 

$$ u(x) = 1.21\, e^{0.88x} \quad \text{(Standard Regression)}, \tag{39} $$

$$ u(x) = 1.14\, e^{0.92x} \quad \text{(SAR)}. \tag{40} $$


The SAR fit is closer to the ideal values than the standard regression. In the future, it would be 
interesting to study how this depends on the type of noise and the form of u(X). In Figure 6, 
we see a graphical depiction of the noisy data and the two fits. Both fits lie very close to the 
true u(X) curve compared to the noisy data; however, there is a clear improvement with the 
SAR fit. 

To understand better the effect of noise, we repeat the above simulation 5000 times to 
obtain a distribution of $\alpha$ for each fitting approach. These distributions are plotted in Figure 7. 
It is clear that both methods provide biased estimates of $\alpha$. The SAR method produces a 
smaller bias, but we would still like to know how that bias depends on the noise amplitude. 
We can test this dependence by finding the bias for various noise/signal ratios r. We will 
choose the same r for X and Y, such that 

$$ r = \frac{\eta_x}{\sigma_x} = \frac{\eta_y}{\sigma_y}. \tag{41} $$

So far, we have only tested r = 0.25, but now we will test a full range from r = 0 to r = 1.2. 
In Figure 8, we have plotted the median estimated $\alpha$ versus r. We see that for small noise, the 
estimate quality is high, but, as r approaches 1, the estimation fails. The $\alpha$ estimated by the 
SAR method is generally of higher quality than the estimate from the standard regression. For 
relatively large noise amplitude neither regression method produces quality estimates of $\alpha$. It is 
clear that, while the derivation of the SAR method assumes noiseless data, our implementation 
of the SAR method is at least as robust to noise as is the traditional least-squares regression. 
The SAR appears to be reliable when the noise amplitude is small compared to the variability 
of the data sample. 






7. An Example from Space Physics 

Finally, we would like to demonstrate the SAR method on a real problem from space 
physics. The GOES 8 geosynchronous spacecraft measures, among other things, the flux of 
electrons with energies above 2 MeV. The spacecraft orbits the Earth once per day. The 
electron populations at geosynchronous orbit are organized by the position of the Sun relative 
to the Earth, which we identify as local time. Owing to the asymmetry of the Earth's magnetic 
field in space, as the spacecraft passes through different local times, it measures slightly 
different parts of the radiation belts. Because the relativistic electron density varies smoothly 
with altitude and the electrons themselves make slightly elliptical orbits every few minutes, 
hour-averaged fluxes at all locations around geosynchronous orbit are well correlated [Li et 
al., 1997]; we therefore expect a monotonically increasing function Y = u(X) relating fluxes X 
measured at one local time $lt_x$ to fluxes Y measured at another local time $lt_y$. We can estimate 
the flux at $lt_y$ from a measurement made by the spacecraft at $lt_x$ if we can determine u(X). 
The probability distributions of electron measurements at every local time at geosynchronous 
orbit are relatively stationary in time; that is, the distribution of measurements in one year is 
roughly equivalent to the distribution of measurements in any other year. Therefore, we can 
estimate F(x) and G(y) using historical measurements of X and Y, and we can use the SAR 
method to reconstruct u(X). We will assume $lt_x$ is local dawn and $lt_y$ is local noon. 

We have obtained GOES 8 measurements for 1998 from CDAWeb (http://cdaweb.gsfc.nasa.gov/) 
[McGuire et al., 2000]. We calculated hourly averages and grouped them into 1-hour bins near 
local dawn and local noon. This gives us about 360 samples at each location, but none that 
are simultaneous because the spacecraft is only at one location at a time. Because electron 
measurements tend to be heavily biased toward low values, we will use the Complementary 
Cumulative Distribution Functions $F_>(x) = 1 - F(x)$ and $G_>(y) = 1 - G(y)$. In terms of these 
functions, for a monotonically increasing u(X), we have 

$$ u(x) = G^{-1}(F(x)) = G_>^{-1}(F_>(x)). \tag{42} $$

Figure 9 shows the constructed $F_>^*(x)$ and $G_>^*(y)$. We can fit both distributions with the 
same analytical form: 

$$ F_>^*(x) \approx e^{-\sqrt{x/[\ldots]}} \quad \text{(Dawn)}, \tag{43} $$

$$ G_>^*(y) \approx e^{-\sqrt{y/533}} \quad \text{(Noon)}. \tag{44} $$

Assuming an increasing u(X), we use (42) to arrive at an analytical form for u(X): 

$$ u(x) = G_>^{*-1}(F_>^*(x)) = 533\,\left(-\log F_>^*(x)\right)^2 = \frac{533}{[\ldots]}\,x = 1.74\,x. \tag{45} $$

The non-parametric SAR mapping is shown in Figure 10 to be nearly a power-law. We have 
determined an analytical fit to be 

$$ Y = u(X) = (1.8 \pm 0.4)\, X^{1.00 \pm 0.0[\ldots]}. \tag{46} $$

This fit is in agreement with the function u(X) = 1.74X derived in (45) above from 
the implementation of the SAR method using the parameterizations (43) and (44) of the 
cumulative distributions. 
The fact that the exponent in (46) is very nearly 1 indicates that the densities at dawn 
and noon change in fixed proportion to each other, even as the radiation belts are filled during 
geomagnetic activity. If we imagine the electron phase-space structure to be $f(\vec{r}, \vec{v}, t)$, then we 
can state the proportionality as 

$$ f(\vec{r}_1, \vec{v}_1, t) \propto f(\vec{r}_2, \vec{v}_2, t). \tag{47} $$

This relationship suggests a simple separation of variables: 

$$ f(\vec{r}, \vec{v}, t) = f(\vec{r}, \vec{v})\, N(t), \tag{48} $$

where $f(\vec{r}, \vec{v})$ represents a phase space shape function, and N(t) represents the varying global 
relativistic electron content of the geosynchronous region. 

The parameterizations (43) and (44), together with the corresponding prediction (45) 
validated by the direct non-parametric implementation of the SAR method giving (46), suggest 
in addition a simple and useful representation of the heavy tail structure of the distribution 
of electron fluxes in terms of stretched exponentials. Such distributions have been found 
to parameterize a large variety of distributions found in nature as well as in social sciences 
[Laherrere and Sornette, 1995]. They present a quasi-stable property [Sornette et al., 
2000; Sornette, 2000] and can be shown to be the generic result of the product of random 
variables in the "extreme deviation" regime [Frisch and Sornette, 1997]. 



So far, we have only determined the mapping from local dawn to local noon. It may 
also be necessary to allow u(X) to vary with magnetic activity level. The magnetic indices 
Dst and Kp measure the intensity of the magnetospheric ring current and the variability of 
magnetospheric currents, respectively [Mayaud, 1980]. We can create different mappings 
u(X; Dst, Kp) for each of several bins of geomagnetic indices; such binning would organize 
the data by the state of the system, reinforcing the assumption that each u(X; Dst, Kp) is 
monotonic and time invariant. Using the SAR method, we can find mapping functions from 
every local time to every other local time, depending on geomagnetic activity, as necessary; 
this allows us to reconstruct the flux around the entire orbit at any time based only on the 
single measurement made by GOES 8. If we produce fluxes around the entire orbit every 
hour, we can view spatial and temporal variations separately. In particular, if we reconstruct 
a time series of hourly fluxes at a fixed local time, we can perform various time series analyses 
that will not be influenced by the spatial variations seen in the measured time-series. This 
investigation will be reported elsewhere. 

8. Discussion 

We have shown that it is possible to accurately determine the function relating two 
variables even when they are not measured simultaneously. Specifically, we were trying to 
map energetic electron fluxes between different local times at geosynchronous orbit. However, 
we believe that our solution may be useful to other researchers whose data are not taken 
simultaneously. We developed a technique, Statistical Asynchronous Regression (SAR), that 
uses the statistical distributions of two variables to determine the unique monotonic function 
that can map one distribution onto the other. Because the SAR technique only works when 
there is a monotonic relationship between the two quantities, it should only be applied to 
quantities that are believed to be highly correlated with each other. We caution that the SAR 
technique will produce a relationship for any two quantities, regardless of whether they are 
actually related. It is particularly inappropriate to use the SAR to describe chaotic systems, 
which generally arise from non-monotonic behaviors. Also, when the noise amplitude is a 
substantial fraction of the data sample variability, we do not expect the SAR to give reliable 
results. 

To illustrate the SAR technique when the two distributions are known analytically, we 
have provided several examples of common distributions. We have shown that the SAR 
technique can recover the underlying relationship of the two quantities even when one 
distribution diverges or has more than one local maximum. We have provided a simple 
algorithm for implementing the SAR. We derived simple expressions for the uncertainty in the 
estimated relationship between the two quantities. To ensure that the technique is robust for 
noisy data, we have simulated two noisy variables with a known relationship and determined 
how well the SAR technique recovers that relationship; the SAR performs better than a least-squares 
regression, which requires simultaneous measurements of both quantities. While we 
expect that ultimately most scientists will wish to fit u(X) to some parametric form, we feel 
that it is important that the SAR does not require us to assume a parametric form a priori. 

For those wishing to apply the SAR technique to problems where u'(X) passes through 
zero, we offer the following strategy: if u'(X) = 0 occurs at known X and Y, then the SAR 
technique is perfectly valid in bins of X and Y constrained to be between the zeros of u'(X). 
In this way, the SAR would provide a piecewise form of u(X). 

In closing, we would like to suggest some areas that might benefit from the SAR approach. 
In modeling tectonic deformations, it is useful to quantify the balance of deformation 
accommodated by different faults in a complex network [Cowie and Scholz, 1992]. For an 
individual fault, we often can measure only its length or its offset. Relying only on faults with 
both length and offset known would exclude many useful measurements. However, the physics 
of tectonic deformation leads us to expect a monotonic relationship between fault length and 
offset. In this case, the SAR technique would allow us to regress fault length against fault 
offset, using all of the available measurements. Similarly, for individual earthquakes we often 
know only one of seismic moment and energy released [Mayeda and Walter, 1996]; the SAR 
technique would allow us to regress all the available measurements rather than only those from 
earthquakes with both moment and energy known. We hope that the ideas presented here will 
assist those who need to relate non-simultaneous measurements. 



Appendix: The Change of Variables Theorem 

The SAR method relies heavily on the change of variables theorem from probability 
theory. The following derivation will be instructive to those not familiar with the manipulation 
of probabilities. 

We will use the notational style $P[X \le x]$ to denote the probability that any sample from 
the population of $X$ will be less than or equal to some threshold $x$. The formal definitions of 
the probability density functions (PDFs) and cumulative distribution functions (CDFs) for $X$ 
and $Y$ are: 

$$f(x)\,dx = P[x < X \le x + dx], \tag{A1}$$

$$g(y)\,dy = P[y < Y \le y + dy], \tag{A2}$$

$$F(x) = P[X \le x] = \int_{-\infty}^{x} f(x')\,dx', \tag{A3}$$

$$G(y) = P[Y \le y] = \int_{-\infty}^{y} g(y')\,dy'. \tag{A4}$$

By definition $f(x)$ and $g(y)$ are non-negative, and $F(x)$ and $G(y)$ reach 1 at $+\infty$. For most 
purposes, $f(x)$ and $g(y)$ are finite, continuous functions, and we will operate under that 
assumption. Accordingly, $F(x)$ and $G(y)$ are monotonically increasing, invertible functions. 

We assume there is a continuous function $u(X)$ that provides the $Y$ that corresponds to a 
given $X$, 

$$Y = u(X). \tag{A5}$$

We will occasionally replace $Y$ and $X$ with $y$ and $x$, but this should not worry the reader, as 
the function $u$ has the same meaning regardless of its argument. This function must also be 
monotonic: 

$$u'(X) \ne 0 \quad \text{for all } X. \tag{A6}$$

Not only does this imply that $u(X)$ is unique and invertible, but it also implies that the sign of 
$u'(X)$ must be either always positive or always negative. Strictly speaking, $u'(X)$ may vanish 
at isolated points, so long as it only touches, but does not traverse, zero. 

We can write $u'(x)$ as 

$$u'(x) = \lim_{\Delta x \to 0} \frac{u(x + \Delta x) - u(x)}{\Delta x}. \tag{A7}$$

If $u'(x)$ is positive for all values of $x$, then $\Delta x > 0$ implies that $u(x + \Delta x) > u(x)$. Formally, 
with $x + \Delta x$ replaced by $X$, we state this as 

$$u'(x) > 0 \;\Rightarrow\; \{X > x \Leftrightarrow u(X) > u(x)\}. \tag{A8}$$

By similar reasoning, 

$$u'(x) < 0 \;\Rightarrow\; \{X > x \Leftrightarrow u(X) < u(x)\}. \tag{A9}$$

For the case $u'(x) > 0$, we can therefore replace the inequality in (A3) according to (A8) 
to arrive at 

$$F(x) = P[X \le x] = P[u(X) \le u(x)]. \tag{A10}$$

Using (A4) and (A5), we have 

$$F(x) = P[Y \le u(x)] = G(u(x)). \tag{A11}$$
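
Since $G$ is invertible, (A11) can be solved for the mapping itself. The short derivation below makes that step explicit and then works one small case; the choice of a uniform $X$ on $(0,1)$ and a unit-rate exponential $Y$ is an assumption made here purely for illustration.

```latex
\begin{align*}
F(x) = G(u(x)) \quad &\Longrightarrow \quad u(x) = G^{-1}\bigl(F(x)\bigr), \\
F(x) = x,\quad G(y) = 1 - e^{-y} \quad &\Longrightarrow \quad u(x) = -\ln(1 - x), \qquad 0 < x < 1 .
\end{align*}
```

In this case $u'(x) = 1/(1 - x) > 0$ on the whole interval, consistent with the increasing case assumed in the derivation.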



For the other case, $u'(X) < 0$, we can use (A9) similarly to replace the inequality in (A3), 
which gives 

$$\begin{aligned} F(x) = P[X \le x] &= P[u(X) \ge u(x)] \\ &= 1 - P[u(X) < u(x)]. \end{aligned} \tag{A12}$$



For a finite, continuous distribution $f(x)$, $P[u(X) < u(x)] = P[u(X) \le u(x)]$. Therefore, we 
can apply (A4) and (A5) to (A12) to arrive at 

$$F(x) = 1 - P[Y \le u(x)] = 1 - G(u(x)). \tag{A13}$$
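
As a quick check of the decreasing case, (A13) can be inverted to give u(x) = G^{-1}(1 - F(x)). The sketch below evaluates this numerically for one assumed pair of distributions: X exponential with unit rate and Y uniform on (0, 1), for which the exact mapping is u(x) = exp(-x). The helper names `invert_cdf` and `u_decreasing` are hypothetical, and the bisection solver is just one convenient way to invert a CDF.

```python
import math


def invert_cdf(cdf, p, lo, hi, tol=1e-12):
    """Solve cdf(y) = p on [lo, hi] by bisection; cdf must be increasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def F(x):
    # CDF of X ~ Exponential(1), an assumed example distribution.
    return 1.0 - math.exp(-x)


def G(y):
    # CDF of Y ~ Uniform(0, 1), an assumed example distribution.
    return y


def u_decreasing(x):
    # (A13): F(x) = 1 - G(u(x))  =>  u(x) = G^{-1}(1 - F(x))
    return invert_cdf(G, 1.0 - F(x), 0.0, 1.0)


for x in (0.1, 1.0, 3.0):
    print(x, u_decreasing(x), math.exp(-x))
```

Using G^{-1}(F(x)) here instead would yield an increasing function, which also conserves probability; the distributions alone cannot tell the two apart, so the sign of u' must come from prior knowledge.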



By differentiating (A11) and (A13), we arrive at 

$$f(x)\,dx = g(u(x))\,u'(x)\,dx \quad \text{for } u'(x) > 0, \tag{A14}$$

$$f(x)\,dx = -g(u(x))\,u'(x)\,dx \quad \text{for } u'(x) < 0, \tag{A15}$$



or, equivalently, 

$$f(x)\,dx = g(u(x))\,|u'(x)|\,dx = g(y)\left|\frac{dy}{dx}\right|dx = g(y)\,|dy|. \tag{A16}$$

Probability is conserved under a change of variables. This is the change of variables theorem, 
and it is depicted graphically in Figure 1. 
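
The theorem can be verified numerically for a case where both densities are known in closed form. The sketch below assumes X is standard normal and u(x) = exp(x), so that Y is standard log-normal; these distributions and the helper `integrate` are illustrative choices rather than anything taken from the examples in this paper. It checks the pointwise identity f(x) = g(u(x))|u'(x)| and the equal-area property over an interval and its image.

```python
import math


def f(x):
    # PDF of X ~ Normal(0, 1).
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def g(y):
    # PDF of Y = exp(X), i.e. the standard log-normal density.
    return math.exp(-0.5 * math.log(y) ** 2) / (y * math.sqrt(2.0 * math.pi))


def u(x):
    return math.exp(x)


def du(x):
    # Derivative of u; positive everywhere, so u is increasing.
    return math.exp(x)


def integrate(func, a, b, n=20000):
    """Composite trapezoid rule on [a, b] with n panels."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h


# Pointwise form of the theorem: f(x) = g(u(x)) |u'(x)|.
x0 = 0.3
print(f(x0), g(u(x0)) * abs(du(x0)))

# Equal areas: probability in [a, b] equals probability in [u(a), u(b)].
a, b = -0.5, 1.2
area_x = integrate(f, a, b)
area_y = integrate(g, u(a), u(b))
print(area_x, area_y)
```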

Figure 1. Probability densities for $X$ and $Y$ are plotted outside the respective axes. The 
relational function $Y = u(X)$ provides the scaling from $X$ to $Y$. Consistent with the conservation 
of probability, the shaded regions have equal area. 

Figure 2. Cumulative distribution functions are plotted for X and Y on the horizontal axis. 
Following the dashed line, one can easily determine what value of Y corresponds to a given X. 

Figure 3. In the same format as Figure 1, this is a depiction of the mapping from a bimodal 
distribution to a Gaussian. The SAR method easily handles the bimodal $f(x)$ and the highly 
non-linear $u(x)$. 

Figure 4. In the same format as Figure 1, this depicts the mapping from a stretched exponential 
to a Gaussian. The divergence in $f(x)$ does not prevent the SAR method from recovering $u(x)$. 

Figure 5. The constructed $F^*(x)$ and $G^*(y)$ are plotted on the same horizontal axis. We 
have assumed only 15 samples from $X$ and 25 samples from $Y$. The width $2\Delta y$ represents 
the uncertainty in the estimates of $y = u(X)$. The estimation error grows in the tails of the 
distributions, owing to under-sampling of the low probability density. 

Figure 6. Two approximations to the true $u(X)$ show the effect of noisy samples. In this 
simulation, the SAR approximation is actually closer to the true $u(X)$ than a simultaneous 
regression. 

Figure 7. The distributions of values of $\alpha$ in $u(x) = \beta e^{\alpha x}$ obtained from two regression 
methods. For this example, the SAR method typically produces a better $\alpha$ than a standard 
simultaneous regression. 

Figure 8. The median estimated $\alpha$ in $u(x) = \beta e^{\alpha x}$ for two regression methods. The SAR 
estimate quality drops more slowly than that of the simultaneous regression. 

Figure 9. This plot depicts estimated complementary CDFs for $X$ and $Y$. Note that the 
vertical axis is logarithmic and the horizontal axis is to the ^ power. The solid lines represent 
the tabular forms of the CDFs and the dashed lines depict the analytical fits with equations 
(ID and il). 

Figure 10. The figure shows the tabular and analytical representations of the mapping 
function from dawn flux to noon flux. It is impossible to obtain this function using simultaneous 
measurements because GOES 8 is never both at dawn and at noon. The crosses through each 
dot have been exaggerated to indicate $\pm 3\Delta x$ and $\pm 3\Delta y$ as calculated by (^) and (^). The 
plot indicates a simple proportional mapping from $X$ to $Y$, which is physically very reasonable. 